Abstract

This dataset contains gene expression profiles of MDA231, BT549 and SUM159PT cells after selumetinib treatment or DUSP4 siRNA knockdown. The MDA231, BT549 and SUM159PT basal-like breast cancer cell lines were transfected with non-targeting siRNA (siCONTROL) or siRNA targeting DUSP4 (siDUSP4), or were transfected with siCONTROL and treated with 1 µM selumetinib for 4 or 24 hr. The data were log2 RMA normalized.

The dataset has 36 samples, which fall into 3 groups by cell line (MDA231, SUM159PT, BT549). Each cell line has 12 samples, which fall into 2 groups by drug treatment (DMSO, selumetinib). Each drug treatment has 3 controls and 3 cases.

Introduction

Basal-like breast cancer (BLBC) is a disease with few clinically approved targeted therapies. This research focuses on dual specificity phosphatase-4 (DUSP4), a negative regulator of mitogen-activated protein kinase (MAPK) pathway activation that is deficient in BLBCs treated with chemotherapy. The paper investigated how DUSP4 regulates the MAP-ERK kinase (MEK) and c-jun-NH2-kinase (JNK) pathways in modifying cancer stem cell-like behavior. Its results support MEK and JNK pathway inhibitors as therapeutic agents in basal-like breast cancer to eliminate the cancer stem cell population.
The paper describes several methods for collecting samples and analyzing the data. For the microarrays, cells were harvested 96 hours after the control and case samples were set up. The statistical analysis used linear regression, ANOVA and Student t tests, as detailed in the original paper: the Student t test for two-group comparisons and ANOVA with Tukey post hoc analyses for multiple-group comparisons.
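
As a rough illustration of the two-group comparison mentioned above, the pooled-variance Student t statistic can be sketched in a few lines of Python. This is not the authors' code: the function name `student_t` and the expression values are hypothetical, and converting the statistic to a p-value (via the t distribution with `df` degrees of freedom) is left out.

```python
import math

def student_t(a, b):
    """Pooled-variance (Student) t statistic and degrees of freedom for two groups."""
    na, nb = len(a), len(b)
    mean_a = sum(a) / na
    mean_b = sum(b) / nb
    # Sums of squared deviations from each group mean.
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    df = na + nb - 2
    pooled_var = (ss_a + ss_b) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean_a - mean_b) / se, df

# Hypothetical log2 expression values for one gene (3 CONTROL, 3 CASE).
control = [5.1, 4.9, 5.0]
case = [6.0, 6.2, 6.1]
t_stat, df = student_t(case, control)
print(round(t_stat, 3), df)
```

With three samples per group the test has only 4 degrees of freedom, which is one reason moderated statistics such as those in limma are often preferred for microarray data.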

The dataset has a total of 36 samples from 3 different cell lines: MDA231, BT549 and SUM159PT. Each cell line contributes 12 samples across 2 treatments, siDUSP4 and selumetinib. Every group of 6 samples consists of 3 CONTROLS and 3 CASES. In the following statistics, the dataset is described by two characteristics: CONTROL versus CASE, and cell line. The linear regression model design was based on these two characteristics.
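
To make the two-factor design concrete, here is a minimal Python sketch of an additive design matrix with one CONTROL/CASE indicator and two cell-line indicators. The sample ordering, the choice of MDA231 as the reference level, and the helper `design_row` are assumptions made for illustration; the report itself builds its design in R.

```python
cell_lines = ["MDA231", "BT549", "SUM159PT"]

# Assumed layout: per cell line, two treatment groups of 3 CONTROL + 3 CASE samples.
samples = []
for line in cell_lines:
    for _group in range(2):
        samples += [(line, "CONTROL")] * 3 + [(line, "CASE")] * 3

def design_row(cell_line, status):
    # Columns: intercept, CASE indicator, BT549 indicator, SUM159PT indicator.
    # MDA231 and CONTROL are the (assumed) reference levels.
    return [1,
            int(status == "CASE"),
            int(cell_line == "BT549"),
            int(cell_line == "SUM159PT")]

design = [design_row(line, status) for line, status in samples]
print(len(design), "samples x", len(design[0]), "coefficients")
```

In this coding the second coefficient estimates the CASE versus CONTROL effect after adjusting for cell line.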

Visualization

Boxplot of the original data.

Boxplot of the data after normalization, using the log2 ratio and cpm.

Boxplot of the original data. Blue represents the CONTROL samples and purple represents the CASE samples.

Plot of the GSE41816 dataset after quantile normalization.
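
For readers unfamiliar with quantile normalization, the idea can be sketched in plain Python: each sample's values are replaced by the mean of the values at the same rank across all samples. This is a simplified illustration with made-up numbers; it ignores ties, and the function name `quantile_normalize` is hypothetical rather than the routine used for the plot.

```python
def quantile_normalize(columns):
    """columns: list of samples, each a list of expression values of equal length."""
    n_genes = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean across samples at each rank.
    reference = [sum(col[i] for col in sorted_cols) / len(columns)
                 for i in range(n_genes)]
    normalized = []
    for col in columns:
        # Gene indices ordered from lowest to highest value in this sample.
        order = sorted(range(n_genes), key=lambda i: col[i])
        out = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            out[gene_index] = reference[rank]
        normalized.append(out)
    return normalized

# Three hypothetical samples measured on three genes.
samples = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0], [3.0, 4.0, 8.0]]
normalized = quantile_normalize(samples)
for col in normalized:
    print(col)
```

After normalization every sample has the same set of values, so boxplots and density curves of the samples coincide.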

QQ-plot of the GSE41816 dataset.

Density plot of the original data using the log2 ratio.

Density plot after applying the cpm function.
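
The two transformations behind these density plots can be written out explicitly. The sketch below shows the plain definition of counts per million (each value divided by its sample total, times one million) and a log2 transform with an offset. It is a Python illustration with toy numbers; the offset of 1 is an assumed choice, and edgeR's `cpm` offers further options (such as log output with a prior count) that are not reproduced here.

```python
import math

def cpm(column):
    """Scale one sample's values to counts per million."""
    library_size = sum(column)
    return [value / library_size * 1e6 for value in column]

def log2_transform(column, offset=1.0):
    # The offset avoids log2(0); its value here is an assumption.
    return [math.log2(value + offset) for value in column]

counts = [10, 30, 60]          # toy values for one sample
scaled = cpm(counts)
logged = log2_transform([1.0, 3.0])
print(scaled, logged)
```

Because cpm rescales every sample to the same total, it removes differences in overall signal between samples before their distributions are compared.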

The MDS plot shows the relationship between CONTROL and CASE samples. CONTROL and CASE samples lie close together; the largest differences are between cell types.

## [1] 947
## [1] 292
## [1] 613
## [1] 334

The glmQLFTest from the edgeR package is based on the MDA231 cell line. 947 genes pass the p-value threshold of 0.05 and 292 genes pass multiple-testing correction. Of the 947 genes, 613 are up-regulated and 334 are down-regulated.
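
The four counts above come from thresholding the raw p-values, thresholding the corrected values, and splitting the significant genes by the sign of logFC. A minimal Python sketch of that bookkeeping follows, using the Benjamini-Hochberg adjustment. The p-values and fold changes are toy numbers, and `benjamini_hochberg` is a hypothetical helper, not the edgeR routine.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy values standing in for the PValue and logFC columns.
p_values = [0.001, 0.04, 0.03, 0.2, 0.6]
log_fc = [1.2, -0.8, 0.5, 0.1, -0.3]

fdr = benjamini_hochberg(p_values)
passed_p = [i for i, p in enumerate(p_values) if p < 0.05]
passed_fdr = [i for i, q in enumerate(fdr) if q < 0.05]
up = [i for i in passed_p if log_fc[i] > 0]
down = [i for i in passed_p if log_fc[i] < 0]
print(len(passed_p), len(passed_fdr), len(up), len(down))
```

As in the report, the up and down counts add up to the number of genes passing the uncorrected threshold, while the corrected threshold retains fewer genes.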

##                     logFC   logCPM        F       PValue          FDR
## ENSG00000166825 -1.325501 6.598318 4881.474 1.953507e-10 3.594062e-06
## ENSG00000148677  1.290621 6.270545 3539.259 1.307444e-09 7.207435e-06
## ENSG00000164176  1.359398 6.437174 2132.666 1.394339e-09 7.207435e-06
## ENSG00000171345  1.312254 6.168467 3735.305 1.567004e-09 7.207435e-06
## ENSG00000138685 -1.354013 6.291498 2052.621 1.980686e-09 7.288132e-06
## ENSG00000178860 -1.321900 6.311231 1592.114 2.913912e-09 7.611416e-06
##                 X.GSM1024692. X.GSM1024693. X.GSM1024694. X.GSM1024695.
## ENSG00000166825      5.281563      4.937310      4.502223      4.299639
## ENSG00000148677     11.520560     11.367750     11.387870     12.188910
## ENSG00000164176     10.715520     10.619550     10.501440      9.942396
## ENSG00000171345     11.352290     11.467580     11.412640     10.544090
## ENSG00000138685      4.199064      3.843173      3.583468      4.350140
## ENSG00000178860      4.589334      3.993551      4.157411      4.164770
##                 X.GSM1024696. X.GSM1024697. X.GSM1024698. X.GSM1024699.
## ENSG00000166825      4.562559      4.559133      4.820214      5.217629
## ENSG00000148677     12.125530     12.191380     11.200870     11.279510
## ENSG00000164176      9.703603      9.920992     10.741190     10.393960
## ENSG00000171345     10.698020     10.632230     11.447330     10.844630
## ENSG00000138685      4.513221      3.639465      3.965435      5.034817
## ENSG00000178860      4.311838      3.681640      4.030405      4.623116
##                 X.GSM1024700. X.GSM1024701. X.GSM1024702. X.GSM1024703.
## ENSG00000166825      5.249080      4.782530      4.612532      5.054136
## ENSG00000148677     11.459280     11.416100     11.463730     11.433180
## ENSG00000164176     10.608210     11.461900     11.492240     11.427310
## ENSG00000171345     11.158700     11.523170     11.562280     11.389000
## ENSG00000138685      3.571180      3.754532      3.848157      4.121012
## ENSG00000178860      4.094188      3.564721      4.586854      4.581983
##                 X.GSM1024704. X.GSM1024705. X.GSM1024706. X.GSM1024707.
## ENSG00000166825     12.004110     11.962360     11.885780     12.016830
## ENSG00000148677      6.104874      6.141503      6.114605      5.725344
## ENSG00000164176     10.917870     10.884730     10.943350     10.563790
## ENSG00000171345      4.873714      4.409018      4.566517      4.259648
## ENSG00000138685      7.435131      7.418010      7.649821      8.264706
## ENSG00000178860      7.780212      7.622669      7.714523      7.107294
##                 X.GSM1024708. X.GSM1024709. X.GSM1024710. X.GSM1024711.
## ENSG00000166825     12.013260     11.926180     12.034530     12.062970
## ENSG00000148677      5.833907      5.701848      5.767415      5.554355
## ENSG00000164176     10.546890     10.606350     10.764500     10.889470
## ENSG00000171345      4.138494      4.003452      4.702843      4.733443
## ENSG00000138685      8.281728      8.080848      7.641626      7.689951
## ENSG00000178860      7.585727      7.459564      7.955678      8.045712
##                 X.GSM1024712. X.GSM1024713. X.GSM1024714. X.GSM1024715.
## ENSG00000166825     11.963400     12.059590     12.118400     12.061240
## ENSG00000148677      5.584886      5.720542      5.260229      5.900916
## ENSG00000164176     10.736000     10.973390     10.737780     10.859430
## ENSG00000171345      4.729569      4.536699      4.353832      4.975224
## ENSG00000138685      7.729908      7.897225      7.910603      7.991155
## ENSG00000178860      7.846854      8.678860      8.241144      8.573381
##                 X.GSM1024716. X.GSM1024717. X.GSM1024718. X.GSM1024719.
## ENSG00000166825     12.520290     12.405170     12.454030     12.388500
## ENSG00000148677      4.828815      4.317569      4.599101      4.942575
## ENSG00000164176      3.941980      4.238735      4.526144      4.284027
## ENSG00000171345      4.741997      4.374214      4.303652      4.394619
## ENSG00000138685     10.583470     10.537690     10.584110     10.824450
## ENSG00000178860     10.820920     11.010890     10.939960      9.956842
##                 X.GSM1024720. X.GSM1024721. X.GSM1024722. X.GSM1024723.
## ENSG00000166825     12.410740     12.389160     12.444240     12.439400
## ENSG00000148677      4.842864      4.845449      4.700742      4.661554
## ENSG00000164176      4.660891      4.195507      4.089976      3.712907
## ENSG00000171345      4.525424      4.203963      4.522422      4.564958
## ENSG00000138685     10.843160     10.848110     10.589870     10.467830
## ENSG00000178860     10.040360     10.090260     11.143030     11.195160
##                 X.GSM1024724. X.GSM1024725. X.GSM1024726. X.GSM1024727.
## ENSG00000166825     12.384330     12.250760     12.491760     12.457070
## ENSG00000148677      4.981931      4.183184      4.678524      5.022226
## ENSG00000164176      3.578350      3.727744      4.292710      4.114725
## ENSG00000171345      4.532353      4.249921      4.680730      4.616951
## ENSG00000138685     10.521600     10.465490     10.690850     10.609710
## ENSG00000178860     11.307790     11.036660     11.094830     11.062190

This is part of the table produced by the quasi-likelihood fit and p-value calculation. The two tables show the top hits ranked by p-value and the corresponding original data.

## [1] "ENSG00000117602" "ENSG00000175793" "ENSG00000162599" "ENSG00000184588"
## [5] "ENSG00000099260" "ENSG00000134247"

The differentially expressed genes were then extracted; some of their names are shown above. These two plots visualize the number of differentially expressed genes.

The heatmap contains the top hits: differentially expressed genes with a p-value less than 0.05, as calculated by the quasi-likelihood model.

##                    ID      logFC   AveExpr         t      P.Value    adj.P.Val
## 5781  ENSG00000187720  0.7370299  7.479091  8.640918 1.120608e-10 2.061695e-06
## 5581  ENSG00000080824  0.3500389 11.425553  6.896796 2.651832e-08 1.427723e-04
## 1239  ENSG00000157193  0.4629367  9.159892  6.874071 2.852303e-08 1.427723e-04
## 12614 ENSG00000144821  0.3827667  5.279393  6.796637 3.657034e-08 1.427723e-04
## 10740 ENSG00000172296 -0.4563264  8.084559 -6.735349 4.453032e-08 1.427723e-04
## 12928 ENSG00000145147  0.4142466  9.335957  6.721477 4.656125e-08 1.427723e-04
##               B
## 5781  13.694997
## 5581   8.797634
## 1239   8.731628
## 12614  8.506416
## 10740  8.327834
## 12928  8.287376
## [1] 2670
## [1] 480

The table is sample output from the lmFit linear regression. 2670 genes pass the p-value threshold of 0.05, and 480 genes pass correction.

The heatmap contains the top hits: differentially expressed genes with a p-value less than 0.05, as calculated by lmFit.

This plot compares the quasi-likelihood model with the limma model.

Fig. 1 from Balko, J. M., Schwarz, L. J., Bhola, N. E., Kurupi, R., Owens, P., Miller, T. W., ... Arteaga, C. L. (2013, October 15). Activation of MAPK pathways due to DUSP4 loss promotes cancer stem cell-like phenotypes in basal-like breast cancer.

The p-values from ANOVA and a two-tailed Student t test are shown in the pictures above. The p-value in picture B comes from ANOVA; the p-value in picture E comes from a two-tailed Student t test.

Fig. 5 from Balko, J. M., Schwarz, L. J., Bhola, N. E., Kurupi, R., Owens, P., Miller, T. W., ... Arteaga, C. L. (2013, October 15). Activation of MAPK pathways due to DUSP4 loss promotes cancer stem cell-like phenotypes in basal-like breast cancer.

Microarray analysis was conducted on RNA derived from MDA231, BT549 and SUM159PT cells treated with siCONTROL or siDUSP4, or with 4 h or 24 h of selumetinib. Picture A is the heatmap of significantly altered genes from MDA231 cells.

Thresholded Analysis

After obtaining the up-regulated and down-regulated gene sets, g:Profiler was used to analyze the two gene lists separately.
The Benjamini-Hochberg FDR significance threshold was used. For data sources, the GO molecular function, cellular component and biological process terms, Reactome, WikiPathways, all regulatory motifs in DNA, all protein databases and the Human Phenotype Ontology were selected. The result sample size was reduced from 1000 to 500.
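
Over-representation tools of this kind typically score each term with a hypergeometric tail probability: the chance of seeing at least the observed overlap between the query list and the term's genes if the query were drawn at random from the background. The Python sketch below shows that calculation with toy numbers. The function name and the counts are hypothetical, and g:Profiler's exact background handling and correction settings are not reproduced here.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_term, n_query, n_overlap):
    """P(X >= n_overlap) when n_query genes are drawn from n_background genes,
    of which n_term are annotated to the term."""
    upper = min(n_term, n_query)
    total = comb(n_background, n_query)
    tail = sum(comb(n_term, k) * comb(n_background - n_term, n_query - k)
               for k in range(n_overlap, upper + 1))
    return tail / total

# Toy example: 20 background genes, 5 annotated to the term,
# 4 genes in the query list, 2 of them annotated.
p = hypergeom_enrichment_p(20, 5, 4, 2)
print(round(p, 4))
```

The per-term p-values obtained this way are then corrected for multiple testing, here with the Benjamini-Hochberg FDR chosen above.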

up-regulated gene list result

Figure 1. Full visualization of up-regulated genes
Figure 2. Full table contents of up-regulated genes
Figure 3. Part of table contents GO:MF and GO:BP
Figure 4. Part of table contents REAC and WP

GO:BP - bundle of His cell to Purkinje myocyte communication GO:0086069
REAC - Platelet activation, signaling and aggregation REAC:R-HSA-76002
WP - Deregulation of Rab and Rab Effector Genes in Bladder Cancer WP:WP2291

down-regulated gene list result

Figure 5. Full visualization of down-regulated genes
Figure 6. Full table contents of down-regulated genes
Figure 7. Part of table contents GO:MF and GO:BP
Figure 8. Part of table contents REAC and WP

GO:BP - skeletal system morphogenesis GO:0048705
REAC - Extracellular matrix organization REAC:R-HSA-1474244
WP - Tryptophan catabolism leading to NAD+ production WP:WP4210

Comparing the two g:Profiler results, the down-regulated gene list yields more information than the up-regulated gene list. Consistent with the paper, DUSP4 expression is associated with downregulated gene expression, which decreases the cancer stem cell population.

References

  1. Balko, J. M., Schwarz, L. J., Bhola, N. E., Kurupi, R., Owens, P., Miller, T. W., ... Arteaga, C. L. (2013, October 15). Activation of MAPK pathways due to DUSP4 loss promotes cancer stem cell-like phenotypes in basal-like breast cancer. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/23966295
  2. Data retrieved from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE41816
  3. hugene11sttranscriptcluster.db. (n.d.). Retrieved from http://bioconductor.org/packages/release/data/annotation/html/hugene11sttranscriptcluster.db.html
  4. Davis, S. and Meltzer, P. S. GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor. Bioinformatics, 2007, 14, 1846-1847
  5. Orchestrating high-throughput genomic analysis with Bioconductor. W. Huber, V.J. Carey, R. Gentleman, …, M. Morgan Nature Methods, 2015:12, 115.
  6. Ritchie, M.E., Phipson, B., Wu, D., Hu, Y., Law, C.W., Shi, W., and Smyth, G.K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43(7), e47.
  7. Robinson MD, McCarthy DJ and Smyth GK (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139-140
  8. McCarthy DJ, Chen Y and Smyth GK (2012). Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research 40, 4288-4297